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Abstract 

We introduee a simple method for nearly simultaneous eomputation of all mo¬ 
ments needed for quasi maximum likelihood estimation of parameters in dis- 
eretely observed stoehastie differential equations eommonly seen in finanee. The 
method proposed in this papers is not restrieted to any partieular dynamies of 
the differential equation and is virtually insensitive to the sampling interval. The 
key contribution of the paper is that computational complexity is sublinear in the 
number of observations as we compute all moments through a single operation. 
Furthermore, that operation can be done offline. The simulations show that the 
method is unbiased for all practical purposes for any sampling design, includ¬ 
ing random sampling, and that the computational cost is comparable (actually 
faster for moderate and large data sets) to the simple, often severely biased, Euler- 
Maruyama approximation. 

Keywords: Quasi likelihood. Diffusion process. Conditional moment. Maximum 
likelihood. Stochastic differential equation 
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1. Introduction 

Most applications such as simulation or estimation involving ltd stochastic 
differential equations (SDEs) are in one way or another linked to the transition 
probabilities of the process. For example, it would be straightforward to estimate 
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parameters using the maximum likelihood method if transition probability density 
was known, but this is rarely the ease in praetiee. 

However, it is often possible to approximate the transition probability density. 
The probability density was obtained by brute foree numerieal eomputation of 


the solution to the Fokker-Planek equation (a partial differential equation) in Lo 


oo 

00 

o^ 

Eindstrdrr 

1 (2007) while Monte Carlo based approaehes were proposed in 

Pedersen 

(1995b) 

Durham and Gallant 

(2002) 

Beskos et al. (2009); Eindstrom 

(2012b) and referenees therein. Those methods are eomputational expensive. 


making them unsuitable for large data sets. A Gauss-Hermite series expansion 
of the transition probability density was proposed by Ait-Sahalia (2002), although 
that approaeh is limited to models with a speeifie strueture. 

The reeent advanees in eolleeting and storing large amounts of data are shifting 
the foeus away from eomputationally slow but statistieally effieient maximum 
likelihood methods towards eomputationally faster, yet not quite as statistieally 
effieient quasi-maximum likelihood methods as the abundanee of data often more 
than makes up for the loss of effieieney. 

A simple approaeh based on the quasi maximum likelihood teehnique was in- 
trodueed inIFlorens-ZmirouM 19891) where the eonditional mean and varianee were 


obtained from an Euler-Maruyama diseretization of the model, ef. Kloeden and 


Platen (1992). This is very effieient from a eomputational point of view and it was 


shown in Florens-Zmirou ( 1989| ) that their method is equivalent to the maximum 
likelihood estimator as the sampling interval goes to zero, as the bias vanishes. A 


higher order version of this approaeh is proposed in 

Kessler 

(1997). Quasi max- 

imum likelihood methods are generally unbiased, see 

Sprensen 

(2012|) provided 

that the mean and varianee are eorreetly speeified. The bias in the Elorens-Zmirou 


(1989); Kessler (1997) methods therefore explieitly depends on the quality of the 
approximation of eonditional moments, ef. [Hook and Lindstrom ( 2014[ ). 

The purpose of this paper is to develop a eomputationally fast method quasi 
maximum likelihood estimator for diseretely observed diffusion proeesses that is 
suitable for moderate to large data sets. We show that the eomputational eost is 
sublinear rather than superlinear due to way the moments are eomputed. (our sim¬ 
ulations shows the eomputational eomplexity is eomparable to that of the Euler- 
Maruyama seheme, and henee magnitudes faster than any approximate maximum 
likelihood method). This will be aehieved without the bias problems assoeiated 
with the Euler-Maruyama method, a property that is virtually independent of the 
sampling interval. 

The outline of the paper is as follows. In Seetionj^we formulate the statis- 
tieal problem and diseuss some alternative teehniques for ealeulating eonditional 
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moments. This is followed by Seetionj^ where we present a numerieal implemen¬ 
tation that results in sublinear eomplexity. The resulting parameter estimation 
algorithm is demonstrated in Seetion on two qualitatively different diffusion 
proeesses as well a randomly sampled data followed by the eonelusions being 
drawn in Seetion [51 


2. Diffusion processes and conditional moments 

Let (fl, P, {Pt}t>o) be a filtered probability spaee and let Xt{9) be a stoehas- 
tie proeess defined on that spaee that solves the following one dimensional stoehas- 
tie differential equation (SDE) 

dXt = ae{Xt)dt + he{Xt)dWt, Xt^ = x. (1) 


We assume throughout the paper that the drift and diffusion terms are regular 
enough (e.g. bounded growth and loeal Lipsehitz, see jKaratzas and Shreve| ( |2012| ) 
for alternative eonditions) to ensure existenee and uniqueness of the solution. 

The optimal method for estimating the parameters, 6, is the maximum likeli¬ 
hood estimator. Let Xk = x{tk), k = 1,..., iT be observations generated from 
Eq ([T]). The maximum likelihood estimator is defined as 


Omle = argmaxf(6*) 
eee 


( 2 ) 


where the log-likelihood funetion is given by 

K 

^{0) = \ogpe{xo) + ^\ogpe{xk\xk-i)- (3) 

k=l 


The transition probability densities, pe{xk\xk-i) are implieitly defined by the 
model. The properties of the model are found by analysing the generator 

The transition probability pe{xk\xk-i) is the solution to the Eo kk er-Planck 
equation, which is defined from the adjoint operator (£m, v) = (m, C*v) under the 
inner product (■,•). The Eokker-Planck equation, when starting from Xk at time tk 
and ending at tk+i is given by 

d 

—pe{x,t) = -C*p0{x,t) (5) 
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with the initial condition p 0 (x\xk) = S{x — Xk). This initial condition is likely to 
cause problems for numerical implementations of Eq. @ due to discontinuity, see 


the implementation in Lo (19881 and the remedy proposed in Lindstrom (2007). 

Another method for computing the transition probability is to use the Markov 
property and law of total probability, adding and integrating out an intermediate 
state variable, see [Peders^ (| 1 995b|a]) Let tk-i < s <tk. It then holds that 


pe{xk\xk-i) = J pe{xk,Xs\xk-i)dxs 

= Ee \pe{xk\xs)\xk-i] 


( 6 ) 


Monte Carlo methods can easily approximate that expected value, but the use of 


variance reduction techniques is needed for most applications, cf. Durham and 
Gallant] ( [2002| ); [Lindstrom p012a| ). 

However, we cannot expect to be able to solve either the Lo kk er-Planck equa¬ 
tion Eq. (|^ or the conditional expectation in Eq. (|^ in closed form for more 
complex models. That means that the complexity of any of these approximate 
maximum likelihood method will be linear (in terms of expensive operations) in 
the number of observations. 


A possible remedy are the Gauss-Hermite, see Ait-Sahalia (2002), or saddle 
point, see [Ait-Sahalia and Yu| (|2006|) expansions. These can be very accurate 


for frequently sampled data but there are also important limitations. Such as the 
existence of the Lamperti transform, and as well as = tk — ffc-i being small 
as the error typically is O with L being the number of terms in the series 

expansion. A key operation is to employ a Lamperti transformation of the process 


Yt = g{Xt) 

such that the dynamics of Zt is given by 

dY^ = f{Y,)dt + dWt. 


(7) 


( 8 ) 


The transition probability in [Ait-Sahalia[ ( [2002[ ) is given by the Hermite series 

(9) 


approximation (here assuming that = A for all observations) 

'Vk - Vk-i 

j =0 

with the coefficients given by 


PYiVklVk-i) = A^/^ ' ) £7^77,• 


AV2 




H; 


Vk - Vk-i 
AV2 


\^n—l 


( 10 ) 
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where Hj is a Hermite polynomial of order j. It is worth noting that the series 
expansion will not eonverge for all distributions and that a finite expansion may 
be negative for some values. The latter ean be solved by eonsidering a series ex¬ 
pansion of the log-density, but that series may not integrate to unity. Still, the 
eomplexity is essentially sublinear as the major eomplexity, deriving the expan¬ 
sion, is performed only onee. 

A simpler alternative is to resort to a quasi maximum likelihood estimator, 
ef. [Godambe and Heyde| ( flOTO) ); |S0rensen| ( 2012] ); [Lindstrom et al!] ( |2015[ ). The 
downside is a loss of statistieal effieieney as the distribution of the Maximum 
likelihood estimate is given by 


y]v(0-0o) 


( 11 ) 


where do is the true parameter and Ij? is the Fisher information matrix defined as 


(If),, = E 


dOidOj 


logp(X|0o) 


( 12 ) 


or equivalently 


{I 


F)i,j 


= E 


7^ logp(X|0o))(7^ logp(X|0o)) 


•de^ 


'de. 


(13) 


whereas the distribution of the quasi maximum likelihood estimate is given by 

(14) 


where 


and 


yx (0 - 0o) 4 N{0, J-^JJ-^), 


(^)m = E 


dOjdO. 


log^(X|0o) 


(I)i.i = E 


((—logvh(X|0o))(7^1ogvl/(X|0o)) 




86 , 


with 4/ being a Gaussian density with loeation and seale parameters given by the 
eonditional mean and (eo)varianee. That eovarianee is always larger or equal to 
the varianee of the maximum likelihood estimate (this follows from the Cramer- 
Rao inequality), with the differenee typieally being rather small for nearly Gaus¬ 
sian models, ef. [Overbeek and Ryd5i](|1997|). 
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2.1. Conditional moments 

Parameter estimation using QML is a reason for bringing us to the topie of 
ealeulating eonditional moments of the stoehastie proeess. We will throughout 
this seetion assume that g{-) is a general funetion representing any eonditional 
moment of interest. The conditional moment is given by 


^e[9{Xk)\Xk-i = Xk] = / g{xk)pe{xk\xk-i)dxk, 


(15) 


where p 0 {xk\xk-i) is the conditional probability density. 

The discussion in the previous section illustrates why Eq. ( [T5] ) is intractable 
in the general case, which is why we have to resort to approximations. Most 
approximations of a conditional moment can be expressed as a weighted sum 


'E0[g{Xk)\Xk-i = Xk] ^ '^u:ig{ii). 


(16) 


This approximation includes techniques like Monte Carlo estimation and various 
deterministic quadrature rules, such as rectangular rule, the trapezoidal rule of the 
Gauss-Hermite quadrature. 

It is also possible to approximate that conditional expectation using an ltd - 
Taylor expansion. Assume that the function is + 1 times continuously differ¬ 
entiable. It then holds that 


A 


^0[g{Xk)\Xk-i = Xk]=Y,C^g{xk)^ + O{X]^-^^), Ak = tk-tk-i. (17) 


/=o 


Note that this expansion is not guaranteed to converge unless additional con¬ 
straints are imposed on X, see |Ait-Sahalia| ( [200^ for details, but it often works 
quite well for small time intervals as the leading error term is This type 

of approximation is used compute moments in the Hermite series expansion in 


Ait-Sahalia (2002). It may be necessary to iterate the ltd -Taylor expansion over 
a series of smaller steps At/m for sparsely sampled data, cf. Runge-Kutta and 
multistep methods, see |Kloeden and Platen ( 1992| ). 

An alternative, mentioned in the beginning, is to calculate conditional mo¬ 
ments using the generator. This will require us to solve one PDEs for each mo¬ 


ment. The Eeynman-Kac (E-K) formula, see Karatzas and Shreve (2012), estab¬ 
lishes the relation between conditional expectations and parabolic partial differ¬ 
ential equations. Specifically, let r G [tk-i,tk\ and define the conditional expec¬ 
tation 

u{x,t) = E0[g{Xk)\Xr = x] (18) 
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when the dynamies is given by Eq. 0- The solution to the expeetation is then 
given as the solution to 

-^u{x,t) =-Cu{x,t). (19) 

or 

where the operator C is defined in Eq. Q and the initial eondition is given by 

u{x,tk) = g{x). 

There are at least two advantages of solving the adjoint problem eompared to 
the Eokker-Planek equation. The first is that we only need to solve one single PDE 
for any number of observations, whieh should be eompared to the eomputational 
eomplexity of the Monte Carlo and quadrature methods where it is neeessary to 
propagate weights and/or partieles between eaeh observation. Seeondly, it is more 
robust from a numerieal point of view to solve the adjoint equation as it has a 
well posed initial eondition (the equation is solved baekwards in time) e.g. for the 
standard moments a polynomial, x^. 

We will later show that it is in faet enough to solve a single PDE regardless of 
the number of moments we are interested in. That makes the eomputational eom- 
plexity marginal eompared to that of a full blown approximate maximum likeli¬ 
hood estimator. 

We present a eoneeptual summary of the pros and eons of eaeh method respee- 
tively in TableWe have marked the Eokker-Planek method with a (*) sinee the 
performanee of this method for small A^. depends strongly in the type of initial 
eondition as deseribed earlier. 

Table 1: Eeasibility of different methods for parameter estimation. 


Method Small A Large A Small data set Large data set 


Euler-Maruyama 

Yes 

- 

Yes 

Yes 

Ito -Taylor (Eq. (fTTj)) 

Yes 

- 

Yes 

Yes 

Eokker-Planek 

* 

Yes 

Yes 

- 

Monte Carlo 

Yes 

Yes 

Yes 

- 

Hermite series 

Yes 

- 

Yes 

Yes 

Generator (E-K) 

Yes 

Yes 

Yes 

Yes 


In the following seetion we will present a numerieal approaeh to ealeulate the 
eonditional moments from the Kolmogorov-baekward equation. 
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3. Discretization of the Kolmogorov backward equation 


Solving the Kolmogorov backward equation numerically can be achieved with 
a large number of different methods. We have opted for a semi-discretization with 
central differences in space to achieve maximum simplicity and as we will later 
see also the possibility to reuse calculation. The backward equation Eq. ( [T9l ) is a 
Cauchy problem in the sense that it has a final condition defined by the conditional 
moment of interest, u{x, T) = g{x), but lacks boundary conditions. This is sim¬ 
ilar to many option pricing problems where it is common to impose a boundary 
condition from asymptotic expansion of the solution, cf. |von Sydow et al.| ( |2015[ ) 
for an overview of numerical techniques for computing option prices. For Eq. ( [T9l ) 
to be well-posed it is necessary to impose boundary conditions for certain values 
of the coefficients. The condition when this is necessary may be found from the 
Fichera function (here in one dimension). 


0,Af 

J^ich = 


1 d 


K 


( 20 ) 


where k = { — 1,1} are the boundary normals. Boundary conditions are not re¬ 


quired when J^ich > 0, Fichera (19561. As an example the Fichera condition 
for the Cox-Ingersoll-Ross model at Xq = 0 is given by ab > cr^/2 which is also 
known as the Feller condition. Under the assumption that the backward equation 
is well-posed without boundary conditions in the sense of positive Fichera, we still 
need to define some conditions for the boundary values for the semi-discretized 


system. In Ekstrom et al. (2009) it was suggested to calculate the boundary values 
by solving a simplified backward equation at the boundaries with finite differences 
defined on the internal node points. We generalize this approach here by solving 


the full equation Eq. (191 at the boundary using interior node points. The advan¬ 
tage of this approach is that it does not require a large solution domain and it does 
not introduce a right hand side vector in the algebraic system. This enables rapid 
calculation of the time integration of the solution which we will utilize. 

Turning to the discretization of the derivatives for the interior nodes. The first 
and second partial derivatives are approximated by second order central differ¬ 
ences with Un = u{xyam + nh), h being the distance between two nodes. The 
derivatives are then given by 


OUn 1 / , 

K+l - Wn-l) 


( 21 ) 
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and 


d^Un 

dx^ 


^2 (^ra+1 -|- M^—i) . 


( 22 ) 


with both approximations having errors of size 0{h?). Inserting the FD approxi¬ 
mations ([2T]and[22l) into Eq. (191 results in 

du 1 1 

i^^n) 2^ (^n+1 l) 2 ,h‘^ (^n-|-l 211^ -f Un—1^ ■ (23) 

At the boundaries 0, N we need to solve Eq. ( [T9{ ) with skewed finite differences. 
On the lower boundary we use the following scheme: 


and 


dUn 

dx 

d'^Un 

dx"^ 


1 

h 


-Uo 


2 ui + -U2 


— (2mo - 5mi + Au2 - U3). 


Similar approximations are used on the upper boundary, 


(24) 


(25) 


dUn ^ I - o I “ 

~ T 'K'^N — 2Un-1 + 7:Un-2 


dx h \ 2 


(26) 


and 


d'^Un 

dx‘^ 


{2un — Smat-i + 4MAr_2 — Un-s) ■ 


(27) 


Inserting these ED approximations (24 ^ into Eq. ( [T9| ) result in similarly equa¬ 
tions as Eq. ( [23| ). Our approximation to the Kolmogorov-backward equation after 
approximating the spatial operator is given by, 


dUn 

dr 


= AuJt) 


(28) 


where A is a banded matrix with the following elements. 


A _ 

2h 


bejxiY 

2^2 


4 _ be^XiY . _ 

1 -^ 2,2 ]- i 2 1 -^ 2 , 2—1 


_ aeixi) bg{xiY 


2h 


2/i2 


The first and last rows in A have extra nonzero columns from the extrapolated 
boundary equations. Now that we have discretized the spatial operator we turn 
to the time discretization. In this paper we will use the matrix exponential to 
propagate in time, see|Moler and Eoan|(|2003|). This is feasible due to the boundary 
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technique introduced earlier. To illustrate the benefit of our approach of including 
the boundary values in A we consider the case when standard boundary conditions 
e.g. Dirichlet are used. The semi-discretized system then become 


dUn 

dr 


= AuJt) 


(29) 


where b contains the boundary values (here assumed to be time independent). The 
general solution of Eq. (|29l) is given by 


b 

(30) 


Un{tk-i) = exp ^A(4_1 - 4) j Un{tk) + A ^ (^exp (^A(4_i - 4) j 


which is computationally more expensive then the solution of Eq. ( [28] ), 

Mn(4-i) = exp (A(4_1 - 4)) Un{tk). 


(31) 


Eurthermore another drawback with the classical boundary values is dim(A) S> 
dim (A) since it requires a larger solution domain to avoid boundary values in¬ 
fluencing the solution which is an additional computational cost. Returning to 
Eq.igg the approximation error to the analytical solution (in terms of an expo¬ 
nential map) is given by, 


u{x, 4_i) = exp (£(4-1 - 4)) 4) 

exp (A(4_i - 4)) Un{tk) = Un{tk-l) + 0{h^). 


(32) 

(33) 


Since A is not normal we might need to subiterate the solution in time for stabil¬ 
ity reasons. This is also required when we need to evaluate the solution at non- 
equidistant time intervals e.g. when the data is collected at random time instances. 

A subiterated solution is obtained from the following identity exp (Ar) = (exp (Ar/m))' 
where a typical good value for m can be found from the following condition 


II Ar/m|| < 1 given in Moler and Eoan (2003). 

We use cubic spline interpolation to compute the conditional expectation for 
values that are not part of the finite difference grid. The interpolation error due to 
the cubic splines are (9(h‘^) which is dominated by the finite difference error for a 
dense grid. 

The conditional mean and variance are accurately computed from the solution 
of the first moment 

u^'^\x,tk-i) = E[Afc|Afc_i = x] 

and second moment 


u 


( 2 ) 


(a;,4-i) = E[Xfc|Xfc_i = x]. 
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The conditional variance is then obtain through a combination of these 


Var[Xfc|X,_i = x] = t>[Xl\Xk-i = x] - t.^[Xu\Xk_^ = x] 

= u^‘^\x, 4_i) - 4-i)^- (34) 


3.1. Convergence 

The semi discretization does not introduce any errors due to the time inte¬ 
gration. However, the discretization of the derivatives and the interpolation do 
introduce errors. We can decompose the interpolated numerical solution 
by adding and subtracting the numerical solution without interpolation u and the 
true solution u. This leads to 


^interp ^ ^interp ± ^ ± ^ 
_ ^ inter p 


U — U + 

Interpolation, 0{h'^) Discretization, 0{h?) 


= U + 0{h^). 


(35) 


Hence, the interpolation error is dominated by the discretization error if a good 
interpolation method is used. That trivially implies that the conditional mean can 
be computed with arbitrary accuracy. 

Next, we find that the conditional variance is given by 

^{2),interp _ _ ^^(2) ^ ^ (36) 

= _ (^( 1))2 + ( 37 ) 

meaning that also the error in the conditional variance is controlled by the dense¬ 
ness of the finite difference grid, h. This means that both the error in the mean 
and covariance can be made arbitrarily small (we choose the design parameter h). 


leading to consistent estimates, cf. Sprensen (2012). This is in contrast to the ap¬ 


proximate QML estimators in Florens-Zmirou (1989) and Kessler (1997) where 
no refinement of the estimates are possible. 

The numerical quality of the method is benchmarked by comparing it against 
the conditional mean and variance of the Cox-Ingersoll-Ross (CIR) and the con¬ 
ditional mean of its inverse (iCIR). These models are defined by. 


dXt = a{b - Xt)dt + aXl'^^dWt 


dXe = 


aXt + (a^ — a6)X 


dt - aXl'^^dWt 


CIR 

iCIR 


(38) 

(39) 
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(a) Error in the conditional mean (b) Error in the conditional variance 


Figure 1: Absolute error for the CIR model compared between different methods 
for T = 1/6. The methods are the Euler-Maruyama scheme, truncated Ito -Taylor 
{k = 1) Eq. (17) and the proposed method. Note that the conditional var (right 
figure) is equal for the Euler-Maruyama and the ltd -Taylor while the mean (left 
figure) is not. The time T was selected such that all methods converged. The 
Euler-Maruyama and the generator approximation would perform even worse if 
larger T is used. 


with Xt = 1/Xi. That means that we can (at least conceptually) compute the 
transition probability density as well as moments for the iCIR process. 

Let Xn e {Xmin,..., x^ax} bc a grid with x^in = 0.05 and x^ax = 0.15 and 
tm E {0,..., T} where the final time T = 1/6. The conditional mean of the iCIR 
process is quite lengthy and involves a Gamma function and will not be expressed 
here, see Ahn and Gao| ( 19991. The absolute error between the proposed method, 
conditional moments using the ltd -Taylor expansion and the Euler-Maruyama 
scheme are compared to the exact moments for the CIR model in Eigure[^ 

The convergence of the numerical method is seen in Eigure where we have 
plotted the relative error (relative mean square error) as a function of spatial 
discretization for the CIR model using parameters = 2 ^'-^ - 1 with h = 


(^ma,3c 


J/iVx. 


4. Parameter estimation 

To test the applicability of the proposed framework we evaluate the quasi max¬ 
imum likelihood estimator on two diffusion models. The quasi maximum likeli- 
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Figure 2: Spatial relative mean square error of the conditional mean for the iCIR 
and CIR model and conditional variance for the CIR model. Here r = 1/6 and 
number of time steps are Nt = 1000. Parameters for this test was {a, b, a} = 
{15,3,2}. 


hood estimator is defined as 

K 


9 


argmax 

0e0 


log^ E9[Xt\Xk-l\,yi^rg[Xk\Xk-l]^ 

k=l 


(40) 


where ^ yxk, Eg[xk\xk-i],'VsLr 0 [xk\xk-i]j is the Gaussian density function with 

mean Eglxklxk-i] and variance VsLrglxklxk-i] computed with the new method. 
We compare the moments computed from the adjoint equation in Section with 
the Euler-Maruyama method and to the exact moments when they are known. 


4.1. Estimation on moderate data set 

We also consider the inverse CIR model commonly used in interest rate model¬ 
ing, see Eq. (39). This model (or actually a simplification of it) was the preferred 
model in for US interest rate data in the likelihood based analysis in [Durham 


(2003). The model is challenging as the drift is non-linear but it can be shown that 
the conditional moments can be calculated analytically. The test data was gen¬ 
erated from the inverse CIR model using monthly time steps At = 1/12. Using 


13 












Xq = b initial value, we generated N = 1000 observations using the parameter 
{a, 6, a} = {15, 3, 2} . The first 100 observations were then disearded as bumin, 
leaving us with 900 observations. This was repeated 100 times in order to evaluate 
the estimators on independent data sets. 

The estimation was eondueted within the quasi maximum likelihood frame¬ 
work on an Intel® Core i5 @ 2.5 Ghz with 8GB of RAM. As an optimizer we 
used the standard Nelder-Mead, (fminseareh) in Matlab® (R2014b) with initial 
guess (10, 5,1}. The results for the iCIR proeess is presented in Figure where 
we see that the proposed method is unbiased whereas the Euler-Maruyama as well 


as the Durham-Gallant, see Durham and Gallant (2002), approximate maximum 


likelihood estimator are biased (the latter is due to insuffieient imputation in the 
time domain). We also see that the proposed method is as virtually as good as 
the maximum likelihood estimator based on the elosed form transition probability 
density. We also note that the Euler-Maruyama is still worse than the proposed 
method even when a more densely sampled data set is available for that estimator. 

4.2. Estimation on randomly sampled data sets 

To further test the applieability of the proposed method we estimate parame¬ 
ters on a simulated data set from a CIR proeess with samples arriving at random 


times, ef. Ait-Sahalia and Mykland (20031. This would be a ehallenging problem 


for a diserete time model, but ean readily be handled with a eontinuous time model 
within the proposed framework. We have simulated 1000 observations from the 


CIR proeess, ef. Eq. ( [38] ) with the same parameters and bumin as for the iCIR 
proeess with random time intervals. The time intervals are uniformly distributed 
tk — tk-i ~ U{\1/ 252,1/6]). The results when estimating the parameters with the 
proposed method, Durham-Gallant method and the Euler-Maruyama is presented 
in Eigure 1^ (note that we only present the a and a parameters as the b essentially 
is given by the uneonditional mean of the data). It ean be seen that the proposed 
method is unbiased even for randomly sampled data, but also that the varianee 
for the proposed method is worse than that of the maximum likelihood estimator 
based on the analytieal transition probability density. This is in line with our in¬ 
tuition as the transition density will beeome less Gaussian for sparsely sampled 
data, making the MEE the preferred estimator in that situation. 

4.3. Estimation on large data sets 

The ehallenge of large data sets is to develop fast methods. The Euler-Maruyama 
method is very fast (moments are given in elosed form) but requires At —)■ 0 and 
AAf —>■ oo for eonsisteney, of. Sprensen (2012|). The proposed method oomputes 
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Figure 3: Estimated parameters for the iCIR model using an analytical expression 
for the likelihood function, the Euler Maruyama approximation, the Durham- 
Gallant method, our proposed method and an Euler Maruyama estimator using 
more densely sampled data. The data sampling interval was At = 1/12 for the 
iCIR model. Since this time step resulted in a large bias in the Euler-Maruyama 
method; We also ran it on a data set with At = 1/52, dashed boxes in the Eig- 
ure. Eor this dense data set the Euler-Maruyama method performed better, but still 
suffers from bias. 
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Figure 4: Estimated parameters for the CIR model with randomly sampled data 
using an analytical expression for the likelihood function, the Euler Maruyama 
approximation, the Durham-Gallant method and our proposed method. The data 
sampling interval was randomly distributed as At t;(|l/252.1/6]). 


practically unbiased estimates, cf. Figures [^and Q for any sampling interval when 
the finite difference grid is dense enough, and will therefore only require K -f- oo 
for consistency. 

Here we evaluate the computational performance when computing the quasi 
likelihood function when the data consists of iT = 2 000 000 observations, which 
makes the data set computationally infeasible for most other estimators. The pa¬ 
rameters and sampling is the same as in Section 4.1 We present the time needed 
to compute the quasi likelihood function for an increasing set of observations 
(each point in the graph represents 1000 additional observations) for the Euler- 
Maruyama and the proposed method in Figure]^ The plot is based on the average 
taken over three simulations. The proposed method is initially more expensive 
than the Euler-Maruyama as we need to compute one matrix exponential to obtain 
the moments. However, the cost after the initial computation scales similarly as 
for Euler-Maruyama (it is actually somewhat cheaper for many observations), in 
spite that our method is consistent while the Euler-Maruyama is severely biased. 
This result is very encouraging as we do not need to have frequently sampled data 
for consistency meaning that we can work with data sets sampled over longer time 
horizons at the same computational cost, meaning that we could estimate certain 
(typically drift parameters) much better than what would be possible using only 
the Euler-Maruyama or similar algorithms. 
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Figure 5: Computational time (wall-eloek time) averaged over three eonseeutive 
measurements as a funetion of the data set size. 


17 









5. Conclusions 


This paper introduces a framework for computing conditional moments for 
diffusion models based on numerical computation of the Kolmogorov-backward 
equation which is the adjoint to the Fokker-Planck equation with exact integration 
in the time domain. The numerical solution is very accurate compared to standard 
methods for computing conditional moments. We used the computed moments 
in this paper to form a quasi maximum likelihood function for parameter estima¬ 
tion. The method is computationally very fast, as the complexity is sublinear. All 
that is needed is to compute a single matrix exponential to compute all moments, 
regardless of the number of observations. This makes the method well suited 
for parameter estimation of large data sets, which was confirmed by Figure in 
stark contrast to many approximate maximum likelihood methods for which the 
computational complexity typically is superlinear in the number of observations. 
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